A Hybrid Clustering Criterion for R*-Tree on Business Data
Abstract
It is well known that multidimensional indices can improve query performance on relational data. The R*-tree, a well-known member of the R-tree family, is one successful and very popular multidimensional index structure. The clustering pattern of the objects (i.e., tuples in relational tables) among R*-tree leaf nodes is one of the decisive factors in the performance of range queries, a popular kind of query on business data. How, then, is the clustering pattern formed? In this paper, we point out that the insert algorithm of the R*-tree, and especially its clustering criterion for choosing subtrees for newly arriving objects, determines the clustering pattern of the tuples among the leaf nodes. Our discussion and observations make clear that the present clustering criterion of the R*-tree cannot lead to a good clustering pattern of tuples when the R*-tree is applied to business data, which greatly degrades query performance. We then introduce a hybrid clustering criterion for the insert algorithm of the R*-tree. Our discussion and experiments indicate that the hybrid criterion clearly improves the query performance of the R*-tree on business data.
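For orientation, the subtree-choice step that the abstract refers to can be sketched as follows. This is a minimal illustration of the ChooseSubtree rule as it is commonly described for the original R*-tree (least overlap enlargement when the children are leaves, least area enlargement otherwise), not the hybrid criterion proposed in the paper, which the abstract does not specify. The rectangle representation and all function names are assumptions made for this sketch.

```python
from typing import List, Tuple

# A rectangle is (lows, highs), one coordinate per dimension (assumed layout).
Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]


def area(r: Rect) -> float:
    """Volume of a rectangle (product of its side lengths)."""
    result = 1.0
    for lo, hi in zip(r[0], r[1]):
        result *= hi - lo
    return result


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that encloses both a and b."""
    lows = tuple(min(x, y) for x, y in zip(a[0], b[0]))
    highs = tuple(max(x, y) for x, y in zip(a[1], b[1]))
    return (lows, highs)


def overlap(a: Rect, b: Rect) -> float:
    """Volume of the intersection of a and b (0.0 if they are disjoint)."""
    result = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a[0], a[1], b[0], b[1]):
        side = min(hi_a, hi_b) - max(lo_a, lo_b)
        if side <= 0:
            return 0.0
        result *= side
    return result


def choose_subtree(children: List[Rect], new_rect: Rect,
                   children_are_leaves: bool) -> int:
    """Return the index of the child entry that should receive new_rect."""

    def area_enlargement(i: int) -> float:
        return area(union(children[i], new_rect)) - area(children[i])

    def overlap_enlargement(i: int) -> float:
        grown = union(children[i], new_rect)
        others = [c for j, c in enumerate(children) if j != i]
        before = sum(overlap(children[i], c) for c in others)
        after = sum(overlap(grown, c) for c in others)
        return after - before

    if children_are_leaves:
        # Leaf level: least overlap enlargement, ties broken by
        # least area enlargement, then by smallest area.
        def key(i: int):
            return (overlap_enlargement(i), area_enlargement(i),
                    area(children[i]))
    else:
        # Upper levels: least area enlargement, ties broken by smallest area.
        def key(i: int):
            return (area_enlargement(i), area(children[i]))

    return min(range(len(children)), key=key)


# Hypothetical 2-D example: three child rectangles and one new point.
children = [((0, 0), (40, 10)), ((70, 0), (80, 10)), ((45, 0), (50, 100))]
new_point = ((52, 5), (52, 5))
leaf_choice = choose_subtree(children, new_point, children_are_leaves=True)
inner_choice = choose_subtree(children, new_point, children_are_leaves=False)
```

In this made-up example the two rules disagree: growing the first rectangle costs the least area but makes it overlap the third one, so the leaf-level rule prefers the second rectangle. Because this choice decides which leaf a tuple lands in, it shapes the clustering pattern the abstract discusses.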
Similar resources
Retaining Customers Using Clustering and Association Rules in Insurance Industry: A Case Study
This study clusters customers and finds the characteristics of different groups in a life insurance company in order to find a way for prediction of customer behavior based on payment. The approach is to use clustering and association rules based on CRISP-DM methodology in data mining. The researcher could classify customers of each policy in three different clusters, using association rules. A...
Tabu-KM: A Hybrid Clustering Algorithm Based on Tabu Search Approach
The clustering problem under the minimum sum-of-squares criterion is a non-convex, non-linear program with many locally optimal values, so its solution often falls into these traps and cannot converge to the globally optimal solution. In this paper, an efficient hybrid optimization algorithm, called Tabu-KM, is developed for solving this problem. It gathers the ...
Hybrid Algorithm for Noise-free High Density Clusters with Self-Detection of Best Number of Clusters
Clustering is the process of discovering groups of objects such that objects in the same group are similar and objects belonging to different groups are dissimilar. A number of clustering algorithms exist that can solve the problem of clustering, but most of them are very sensitive to their input parameters. Minimum Spanning Tree clustering algorithm is capable of detecting clusters with irre...
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are used widely in many fields, a lot of research is still going on to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of the cluster centers and hence can become trapped in a local optimum. In this paper...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...